What are trendy names?


There are many ways to capture trendiness. A simple measure would be to look at the maximum number of births for a name, which is then adjusted to account for differences in the total number of births across years. This process of adjustment is called normalizing, and it allows for a fair comparison of the trendiness of names across different time periods. A trendy name would have a high value after being normalized in this way.

Transforming the data into a table that consists of the columns sex, name, nb_births_total, nb_births_max and trendiness. Then computing nb_births_total as the total number of births across all years, and nb_births_max as the maximum number of births for a given name across all years. Finally, computing trendiness as a ratio of these two numbers. Returns a plot with trends in the number of births based the name.

---
title: "Project 1: Exploring 100+ Years of US Baby Names"
output: 
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
---

```{r setup, include = FALSE}
library(tidyverse)
library(flexdashboard)
FILE_NAME <- here::here("data/names.csv.gz")
tbl_names <- readr::read_csv(FILE_NAME, show_col_types = FALSE)
knitr::opts_chunk$set(
  fig.path = "img/",
  fig.retina = 2,
  fig.width = 6,
  fig.asp = 9/16,
  fig.pos = "t",
  fig.align = "center",
  # dpi = if (knitr::is_latex_output()) 72 else 150,
  out.width = "100%",
  # dev = "svg",
  dev.args = list(png = list(type = "cairo-png")),
  optipng = "-o1 -quiet"
)
ggplot2::theme_set(ggplot2::theme_gray(base_size = 8))
```

### What are the most popular baby names?

```{r results = "hide"}
# PASTE BELOW >> CODE FROM question-1-transform
tbl_names_popular = tbl_names |> 
  # Keep ROWS for year > 2010 and <= 2020
  filter(year > 2010, year <= 2020) |> 
  # Group by sex and name
  group_by(sex, name) |> 
  # Summarize the number of births
  summarize(
    nb_births = sum(nb_births),
    .groups = "drop"
  ) |> 
  # Group by sex 
  group_by(sex) |>  
  # For each sex, keep the top 5 rows by number of births
  slice_max(nb_births, n = 5)

tbl_names_popular
```


```{r}
# PASTE BELOW >> CODE FROM question-1-plot BELOW
tbl_names_popular |> 
  # Reorder the names by number of births
  mutate(name = fct_reorder(name, nb_births)) |>
  # Initialize a ggplot for name vs. nb_births
  ggplot(aes(x = nb_births, y = name)) +
  # Add a column plot layer
  geom_col() +
  # Facet the plots by sex
  facet_wrap(~ sex, scales = "free_y") +
  # Add labels (title, subtitle, caption, x, y)
  labs(
    title = 'Popular Baby Names 2011 - 2020',
    subtitle = 'Emma and Noah are the most popular names of the decade',
    caption = 'Source: SSA',
    x = '# of Births',
    y = 'Name'
  ) +
  # Fix the x-axis scale 
  scale_x_continuous(
    labels = scales::unit_format(scale = 1e-3, unit = "K"),
    expand = c(0, 0),
  ) +
  # Move the plot title to top left
  theme(
    plot.title.position = 'plot'
  )
```

***

<!-- Add a note to be included in the sidebar -->

To understand baby naming trends within the data collected from Social Security, start by figuring out the top five most popular male and female names for this 
decade (born 2011 and after, but before 2021). 

The first step is to transform the data into a form that is easy to visualize. A table was created with `sex`, `name` and `nb_births` (number of births) for the top 5 names for each `sex`. Then a horizontal bar plot of number of births by name faceted by sex was built to visualize this data. Therein we have the five most popular male and female names of the decade starting in 2011.



### What are trendy names?

```{r results = "hide"}
# PASTE BELOW >> CODE FROM question-2-transform 
tbl_names_popular_trendy = tbl_names |> 
  # Group by sex and name
  group_by(sex, name) |> 
  # Summarize total number of births and max births in a year
  summarize(
    nb_births_total = sum(nb_births),
    nb_births_max = max(nb_births),
    .groups = "drop"
  ) |> 
  # Filter for names with at least 10000 births
  filter(nb_births_total > 10000) |> 
  # Add a column for trendiness computed as ratio of max to total
  mutate(trendiness = nb_births_max / nb_births_total) |> 
  # Group by sex
  group_by(sex) |> 
  # Slice top 5 rows by trendiness for each group
  slice_max(trendiness, n = 5)

tbl_names_popular_trendy
```


```{r}
# PASTE BELOW >> CODE FROM question-2-visualize
plot_trends_in_name <- function(my_name) {
  tbl_names |> 
    # Filter for name = my_name
    filter(name == my_name) |> 
    # Initialize a ggplot of `nb_births` vs. `year` colored by `sex`
    ggplot(aes(x = year, y = nb_births, color = sex)) +
    # Add a line layer
    geom_line() +
    # Add labels (title, x, y)
    labs(
      title = glue::glue("Babies named {my_name} across the years!"),
      x = 'Years',
      y = '# of Births'
    ) +
    # Update plot theme
    theme(plot.title.position = "plot")
}
plot_trends_in_name("Steve")
plot_trends_in_name("Barbara")
```

***

<!-- Add a note to be included in the sidebar -->

There are many ways to capture trendiness. A simple measure would be to look at the maximum number of births for a name, which is then adjusted to account for differences in the total number of births across years. This process of adjustment is called normalizing, and it allows for a fair comparison of the trendiness of names across different time periods. A trendy name would have a high value after being normalized in this way.

Transforming the data into a table that consists of the columns `sex`, `name`, `nb_births_total`, `nb_births_max` and `trendiness`. Then computing `nb_births_total` as the total number of births across all years, and `nb_births_max` as the maximum number of births for a given name across all years. Finally, computing trendiness as a ratio of these two numbers. Returns a plot with trends in the number of births based the name.

### What makes certain letters more popular in names?

```{r results = "hide"}
# PASTE BELOW >> CODE FROM question-3-transform-1 and question-3-transform-2
tbl_names = tbl_names |> 
  # Add NEW column first_letter by extracting `first_letter` from name using `str_sub`
  mutate(first_letter = str_sub(name, 1, 1)) |>  
  # Add NEW column last_letter by extracting `last_letter` from name using `str_sub`
  mutate(last_letter = str_sub(name, -1, -1)) |> 
  # UPDATE column `last_letter` to upper case using `str_to_upper`
mutate(last_letter = str_to_upper(last_letter))

tbl_names

tbl_names_by_letter = tbl_names |> 
  # Group by year, sex and first_letter
  group_by(year, sex, first_letter) |> 
  # Summarize total number of births, drop the grouping
  summarize(nb_births = sum(nb_births), .groups = "drop") |> 
  # Group by year and sex
  group_by(year, sex) |> 
  # Add NEW column pct_births by dividing nb_births by sum(nb_births)
  mutate(pct_births = nb_births / sum(nb_births))
  
tbl_names_by_letter
```


```{r}
# PASTE BELOW >> CODE FROM question-3-visualize-1 and question-3-visualize-2
tbl_names_by_letter |> 
  # Filter for the year 2020
   filter(year == 2020) |>
  # Initialize a ggplot of pct_births vs. first_letter
  ggplot(aes(x = first_letter, y = pct_births)) +
  # Add a column layer using `geom_col()`
  geom_col() +
  # Facet wrap plot by sex
  facet_wrap(~ sex) +
  # Add labels (title, subtitle, x, y)
  labs(
      title = "Distrubition of Letters for Baby Names in 2020",
      subtitle = 'Illustrates the first letter most used for female is "A" and males names is "J"',
      x = 'First letter of Name',
      y = '% of Births'
    ) + 
  
  # Fix scales of y axis
  scale_y_continuous(
    expand = c(0, 0),
    labels = scales::percent_format(accuracy = 1L)
  ) +
  # Update plotting theme
  theme(
    plot.title.position = "plot",
    axis.ticks.x = element_blank(),
    panel.grid.major.x = element_blank()
  )


plot_trends_in_letter <- function(my_letter) {
  tbl_names_by_letter |> 
    # Filter for first_letter = my_letter
    filter(first_letter == my_letter) |> 
    # Initialize a ggplot of pct_births vs. year colored by sex
    ggplot(aes(x = year, y = pct_births, color = sex))+
    # Add a line layer
     geom_line() +
    # Add labels (title, subtitle, caption, x, y)
    labs(
      title = glue::glue("Trends in Names beginning with {my_letter}"),
      subtitle = "% of Births starting with a 'S' throughout the years",
      caption = "Source: SSA",
      x = "Years",
      y = '% of Births'
    ) +
    # Update y-axis scales to display percentages
    scale_y_continuous(labels = scales::percent_format()) +
    # Update theme
    theme(plot.title.position = "plot")
}

plot_trends_in_letter("S")
```

***

<!-- Add a note to be included in the sidebar -->
Have you ever wondered why some letters seem to be more prevalent in names than others? In this question, you will embark on a journey to uncover the reasons behind the popularity of specific letters in names. This investigation will lead you to interesting insights about how the popularity of letters in names has changed over time and the potential factors that have influenced these trends.

1. How have the first and last letters in names changed over the years by sex?
2. What are the trends in percentage of names with a given first or last letter across years.
3. What are the most popular combinations of first and last letters?

Let us start by transforming the data and adding two columns, one for `first_letter` and one for `last_letter`. Then compute the distribution of births across year and sex by first letter of a name. Visualize the distribution of births by first letter for the year 2020, faceted by sex. Lastly, plot trends in the percentage of births for all names starting with "S" as the first letter.


### What secrets do the most popular letter combinations hold?

```{r results = "hide"}
# PASTE BELOW >> CODE FROM question-4-transform
tbl_names_by_first_and_last_letter = tbl_names |> 
  # Filter for sex = "F"
  filter(sex == 'F') |>
  # Group by `first_letter`, `last_letter`, and `year`
  group_by(first_letter, last_letter, year) |>
  # Summarize total number of births
  summarize(nb_births = sum(nb_births), .groups = "drop") |>
  
  # Group by `year`
  group_by(year) |>
  # Add NEW column pct_births by dividing nb_births by sum(nb_births)
  mutate(pct_births = nb_births / sum(nb_births)) |>
  # Ungroup data
  ungroup()

tbl_names_by_first_and_last_letter
```


```{r}
# PASTE BELOW >> CODE FROM question-4-visualize
tbl_names_by_first_and_last_letter |> 
  # Filter for the year 2021
  filter(year == "2021") |>
  # Initialize a ggplot of last_letter vs. first_letter
  ggplot(aes(x = first_letter, y = last_letter)) +
  # Add a `geom_tile` layer with fill mapped to pct_births
  geom_tile(aes(fill = pct_births)) +
  # Add labels (title, subtitle, x, y, fill)
  labs(
      title = "Letter Distribution %",
      subtitle = "Heatmap of the percentage of births by first letter and last letter for the year 2021",
      x = "First Letter",
      y = "Last Letter",
      fill = "Percent of Births" ) +
  # Update fill scale to use Viridis colors
  scale_fill_viridis_b(direction = -1) +
  # Update plotting theme
  theme(
    plot.title.position = "plot",
    panel.grid = element_blank(),
    axis.ticks = element_blank()
  )
```

***

<!-- Add a note to be included in the sidebar -->
Are you ready to explore the fascinating realm of letter combinations in names? This question will guide you through the process of analyzing the joint distribution of births by first and last letters. By examining these intriguing patterns, you'll be able to unveil the most popular letter combinations and how they have evolved over the years. 

Visualize the distribution of `pct_births` by `last_letter` and `first_letter` by plotting a heatmap of the percentage of births by first letter and last letter for the year 2021.

### Are there naming trends in usage of vowels and consonants?

```{r results = "hide"}
# PASTE BELOW >> CODE FROM question-5-transform
get_letter_type <- function(letter) {
  VOWELS <- c("A", "E", "I", "O", "U")
  ifelse(letter %in% VOWELS, 'vowel', 'consonant')
}

tbl_names_vowel_consonant <- tbl_names |> 
  # Add NEW column named `first_letter_type`
  mutate(first_letter_type = get_letter_type(first_letter)) |>
  # Add NEW column named `last_letter_type`
  mutate(last_letter_type = get_letter_type(last_letter)) |>
  # Group by `sex`, `year`, `first_letter_type` and `last_letter_type`
  group_by(sex, year, first_letter_type, last_letter_type) |>
  # Summarize the total number of births
  summarize(nb_births = sum(nb_births), .groups = "drop") |>
  
  
  
  # Group by `sex` and` `year`
  group_by(sex, year) |>
  # Add NEW column with `pct_births` calculated as `nb_births / sum(nb_births)`
  mutate(pct_births = nb_births/sum(nb_births)) |>
  # Ungroup the data
  ungroup() |>
  # Unite `first_letter_type` and `last_letter_type` into a NEW column named `first_last`
   unite(first_last, first_letter_type, last_letter_type, sep = " " )

tbl_names_vowel_consonant
```


```{r}
# PASTE BELOW >> CODE FROM question-5-visualize
tbl_names_vowel_consonant |> 
  # Reorder `first_last` by the median `pct_births`
  mutate(first_last = fct_reorder(first_last, pct_births, median)) |>
  # Initialize a ggplot of `pct_births` vs. `year`
  ggplot(aes(x = year, y = pct_births)) +
  # Add an area layer with fill = first_last
  geom_area(aes(fill = first_last)) +
  # Facet wrap plot by `sex`
  facet_wrap(~ sex, scales = "free_y") +
  # Add labels (title, subtitle, caption, x, y)
  labs(
      title = " Trends in the usage of vowels and consonants in Baby names over time",
      subtitle = "% of births by the combination of first and last letter type 'per sex)",
      caption = "Source: SSA",
      x = "Years",
      y = '% of Births'
    ) +
  # Clean up x and y axis scales
  scale_x_continuous(
    expand = c(0, 0)
  ) +
  scale_y_continuous(
    expand = c(0, 0),
    labels = scales::percent_format()
  ) +
  # Use Viridis colors for fill
  scale_fill_viridis_d() +
  # Update plotting theme
  theme(
    plot.title.position = 'plot',
    legend.position = 'bottom'
  )
```

***

<!-- Add a note to be included in the sidebar -->

Do certain combinations of vowels and consonants tend to appear more often in names? Are there any notable changes in these patterns over the years? In this question, we'll explore the fascinating world of vowel and consonant usage in names across time. This can help us understand how the structure of names has evolved and what factors may have influenced these changes. By diving into these linguistic aspects, you'll gain a greater appreciation for the intricacies and diversity of names in our dataset. Let's dive in and uncover the trends in the usage of vowels and consonants!

In this step, you will create a function to identify whether a letter is a vowel or a consonant. Then, you will use this function to categorize the first and last letters in names as either vowels or consonants. After that, you will group the data by sex, year, and letter type (vowel or consonant) to calculate the percentage of births for each combination of first and last letter types.

Visualize to display the trends in the usage of vowels and consonants in names over time. The visualization will show the percentage of births by the combination of first and last letter types, separately for each sex.